


A Limitations and Societal Impacts

Neural Information Processing Systems

Limitations One limitation of our model is its potential for data bias, which could limit its applications. MLLMs could also be misused to create fake news articles or social media posts.

Table 1: Hyperparameters of the causal language model of K

Number of layers: 24
Hidden size: 2,048
FFN inner hidden size: 8,192
Attention heads: 32
Dropout: 0.1
Attention dropout: 0.1
Activation function: GeLU [1]
Vocabulary size: 64,007
Soft tokens V size: 64
Max length: 2,048
Relative position embedding: xPos [2]
Initialization: Magneto [3]

The detailed instruction-tuning hyperparameters are listed in Table 3. The models are trained on web-scale multimodal corpora.
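For reference, the Table 1 values above can be collected into a single configuration object. This is a minimal sketch; the class and field names are illustrative and not taken from the paper's code:

```python
from dataclasses import dataclass

@dataclass
class CausalLMConfig:
    # Values transcribed from Table 1; names are illustrative.
    num_layers: int = 24
    hidden_size: int = 2048
    ffn_inner_hidden_size: int = 8192
    attention_heads: int = 32
    dropout: float = 0.1
    attention_dropout: float = 0.1
    activation: str = "gelu"
    vocab_size: int = 64007
    soft_tokens_v: int = 64
    max_length: int = 2048
    relative_position_embedding: str = "xpos"
    initialization: str = "magneto"

config = CausalLMConfig()
# Per-head dimension implied by the table: 2048 / 32 = 64.
print(config.hidden_size // config.attention_heads)  # → 64
```

One consistency check the table permits: the hidden size divides evenly by the number of attention heads, giving a 64-dimensional per-head subspace, and the FFN inner size is the common 4x expansion of the hidden size.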



Why physical ID theft is harder to fix than credit card fraud

FOX News

Unlike credit card fraud, identity theft involving a stolen driver's license creates lasting legal exposure: license numbers cannot be changed, and clearing the resulting records requires extensive cleanup efforts.






Zelenskyy warns of 'logistics terror' as Russia hits Ukraine railway

Al Jazeera

President Volodymyr Zelenskyy says he has ordered Ukraine's military leaders to respond after a spate of Russian attacks targeting railway infrastructure and logistics routes. His comments on Monday come after Russian forces stepped up attacks, including on a train last week that killed five people in a railway car in the eastern region of Kharkiv. Russian forces have prioritised the capture of train hubs, such as Kupiansk and Pokrovsk in eastern Ukraine. "The Russian army remains focused on terror against our logistics - primarily railway infrastructure," Zelenskyy said in a post on social media. "In particular there were strikes in the Dnipro region and in Zaporizhzhia, specifically targeting railway facilities."


Learn-to-Distance: Distance Learning for Detecting LLM-Generated Text

Zhou, Hongyi, Zhu, Jin, Xu, Erhan, Ye, Kai, Yang, Ying, Shi, Chengchun

arXiv.org Machine Learning

Modern large language models (LLMs) such as GPT, Claude, and Gemini have transformed the way we learn, work, and communicate. Yet their ability to produce highly human-like text raises serious concerns about misinformation and academic integrity, creating an urgent need for reliable algorithms to detect LLM-generated content. In this paper, we start by presenting a geometric approach to demystify rewrite-based detection algorithms, revealing their underlying rationale and demonstrating their generalization ability. Building on this insight, we introduce a novel rewrite-based detection algorithm that adaptively learns the distance between the original and rewritten text. Theoretically, we demonstrate that employing an adaptively learned distance function is more effective for detection than using a fixed distance. Empirically, we conduct extensive experiments with over 100 settings, and find that our approach demonstrates superior performance over baseline algorithms in the majority of scenarios. In particular, it achieves relative improvements from 57.8% to 80.6% over the strongest baseline across different target LLMs (e.g., GPT, Claude, and Gemini).

The past few years have witnessed the emergence and rapid development of large language models (LLMs) such as GPT (Hurst et al., 2024), DeepSeek (Liu et al., 2024), Claude (Anthropic, 2024), Gemini (Comanici et al., 2025), Grok (xAI, 2025) and Qwen (Yang et al., 2025). Their impact is everywhere, from education, academia and software development to healthcare and everyday life (Arora & Arora, 2023; Chan & Hu, 2023; Hou et al., 2024). On one side of the coin, LLMs can support users with conversational question answering, help students learn more effectively, draft emails, write computer code, prepare presentation slides and more.
On the other side, their ability to closely mimic human-written text also raises serious concerns, including the generation of biased or harmful content, the spread of misinformation in the news ecosystem, and the challenges related to authorship attribution and intellectual property (Dave et al., 2023; Fang et al., 2024; Messeri & Crockett, 2024; Mahajan et al., 2025; Laurito et al., 2025). Addressing these concerns requires effective algorithms to distinguish between human-written and LLM-generated text, which has become an active and popular research direction in recent literature (see Crothers et al., 2023; Wu et al., 2025, for reviews).
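The rewrite-based detection idea described in the abstract can be illustrated with a toy scorer: rewrite the candidate text with an LLM, then measure how far the rewrite moves from the original, on the intuition that LLM-generated text tends to change less under rewriting than human-written text. Everything below is a stand-in sketch: `embed` is a bag-of-words proxy for a real encoder, the fixed cosine distance replaces the paper's adaptively learned distance, and the threshold is arbitrary.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words counts (stand-in for a real encoder)."""
    return Counter(text.lower().split())

def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / (na * nb) if na and nb else 1.0

def detect(original, rewritten, threshold=0.3):
    """Flag text as LLM-generated when an LLM rewrite barely moves it.

    A learned distance function (the paper's contribution) would replace
    cosine_distance here; the fixed threshold is purely illustrative.
    """
    d = cosine_distance(embed(original), embed(rewritten))
    return d < threshold  # small rewrite distance -> likely LLM-generated

# Toy usage: a rewrite that changes almost nothing vs. one that changes a lot.
print(detect("the model writes fluent text",
             "the model writes fluent text today"))        # → True
print(detect("i scribbled this note in a hurry ok",
             "a hastily written informal memo"))           # → False
```

The design point the sketch captures is that the detector never inspects the candidate text in isolation; the signal lives entirely in the original-versus-rewrite distance, which is why the choice of distance function, fixed here but learned in the paper, is the decisive component.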